Fine Particulate Matter Concentration Level Prediction by using Tree-based Ensemble Classification Algorithms

Authors

  • Yin Zhao
  • Yahya Abu Hasan
Abstract

Pollutant forecasting is an important problem in the environmental sciences, and data mining offers a way to discover knowledge from large data sets. This paper uses data mining methods to forecast the concentration level of fine particulate matter (PM2.5), an important air pollutant. Several tree-based classification algorithms are available in data mining, such as CART, C4.5, Random Forest (RF) and C5.0. RF and C5.0 are popular ensemble methods: RF builds on CART with bagging, and C5.0 builds on C4.5 with boosting. This paper builds PM2.5 concentration level predictive models based on RF and C5.0 using R packages. The data set covers the period 2000-2011 in a new town of Hong Kong. The concentration is divided into two levels, with the critical point at 25 μg/m³ (24-hour mean). Under 100 repetitions of 10-fold cross-validation, the RF model gives the best testing accuracy, around 0.845 to 0.854.

Keywords: Random Forest; C5.0; PM2.5 prediction; data mining.
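The evaluation protocol in the abstract (two-level labels split at 25 μg/m³, then 100 repetitions of 10-fold cross-validation) can be illustrated with a minimal Python sketch. This is not the authors' code: the paper reports using R packages for RF and C5.0, whereas the sketch below uses only the Python standard library and substitutes a majority-class placeholder for the classifier. The function names, the label names "low" and "high", and the choice of a strict `>` comparison at the critical point are all assumptions made for illustration.

```python
import random

# Critical point from the abstract: 25 μg/m³, 24-hour mean.
THRESHOLD = 25.0


def to_level(concentration):
    """Map a 24-hour mean PM2.5 value to one of two levels.

    Assumption: values strictly above the threshold count as "high";
    the abstract does not say how a value of exactly 25 is treated.
    """
    return "high" if concentration > THRESHOLD else "low"


def repeated_kfold(n_samples, k=10, repeats=100, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold CV.

    Each repeat reshuffles the indices and cuts them into k disjoint folds.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        folds = [indices[i::k] for i in range(k)]
        for i, test_idx in enumerate(folds):
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train_idx, test_idx


def majority_class(labels):
    """Placeholder model: return the most frequent training label."""
    return max(sorted(set(labels)), key=labels.count)


def cross_validate(labels, k=10, repeats=100, seed=0):
    """Return one test accuracy per fold, k * repeats values in total."""
    accuracies = []
    for train_idx, test_idx in repeated_kfold(len(labels), k, repeats, seed):
        prediction = majority_class([labels[j] for j in train_idx])
        hits = sum(1 for j in test_idx if labels[j] == prediction)
        accuracies.append(hits / len(test_idx))
    return accuracies
```

To adapt the sketch, replace `majority_class` with a model trained on the training rows of each fold; the range of per-repeat mean accuracies then corresponds to the kind of interval the abstract reports.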

Similar resources

Application of ensemble learning techniques to model the atmospheric concentration of SO2

In view of pollution prediction modeling, the study adopts homogeneous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...

Machine learning algorithms for predicting roadside fine particulate matter concentration level in Hong Kong Central

Data mining is an approach to discover knowledge from large data. Pollutant forecasting is an important problem in the environmental sciences. This paper tries to use data mining methods to forecast fine particles (PM2.5) concentration level in Hong Kong Central, which is a famous business centre in Asia. There are several classification algorithms available in data mining, such as Artificial N...

Artificial neural network forecast application for fine particulate matter concentration using meteorological data

Most parts of the urban areas are faced with the problem of floating fine particulate matter. Therefore, it is crucial to estimate the amounts of fine particulate matter concentrations through the urban atmosphere. In this research, an artificial neural network technique was utilized to model the PM2.5 dispersion in Tehran City. Factors which are influencing the predicted value consi...

Machine learning algorithms in air quality modeling

Modern studies in the field of environmental science and engineering show that deterministic models struggle to capture the relationship between the concentration of atmospheric pollutants and their emission sources. Recent advances in statistical modeling based on machine learning approaches have emerged as a solution to these issues. It is a fact that input variable type largely affec...

Zoning of flood hazard in Nowshahr city using machine learning models

The aim of this study is to predict and model flood hazard in the city of Nowshahr, Mazandaran province, using machine learning models. The criteria and indicators affecting flood hazard were identified from a review of the literature, converted into rasters in the ArcGIS environment, and finally standardized by a fuzzy method for use in the models. K-nearest neighbor ...

Publication date: 2013